Dataset statistics
| Number of variables | 13 |
|---|---|
| Number of observations | 3832 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 389.3 KiB |
| Average record size in memory | 104.0 B |
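The overview figures above are simple counts and ratios. As a rough illustration, the sketch below computes the same kind of summary for a tiny made-up table using only the standard library. The `dataset_overview` helper and `toy_rows` are hypothetical, and the duplicate-row definition used here (rows repeating an earlier row) is an assumption, not necessarily the one pandas-profiling applies. The last line checks the report's own arithmetic: 389.3 KiB spread over 3832 observations is about 104 bytes per record.

```python
def dataset_overview(rows):
    """Summarise a table given as a list of equal-length tuples (None = missing)."""
    n_obs = len(rows)
    n_vars = len(rows[0]) if rows else 0
    n_cells = n_obs * n_vars
    missing = sum(value is None for row in rows for value in row)
    # Rows that repeat an earlier row; other tools may define duplicates differently.
    duplicates = n_obs - len(set(rows))
    return {
        "variables": n_vars,
        "observations": n_obs,
        "missing_cells": missing,
        "missing_cells_pct": 100 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100 * duplicates / n_obs if n_obs else 0.0,
    }

toy_rows = [(1, "a"), (2, None), (1, "a"), (3, "c")]  # made-up data
overview = dataset_overview(toy_rows)

# Arithmetic check of the report's own figures: total size / observations.
avg_record_bytes = 389.3 * 1024 / 3832
```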
Variable types
| Numeric | 9 |
|---|---|
| Categorical | 4 |
city is highly correlated with city_development_index | High correlation |
city_development_index is highly correlated with city | High correlation |
relevent_experience is highly correlated with last_new_job | High correlation |
last_new_job is highly correlated with relevent_experience | High correlation |
df_index has unique values | Unique |
major_discipline has 53 (1.4%) zeros | Zeros |
experience has 107 (2.8%) zeros | Zeros |
company_size has 371 (9.7%) zeros | Zeros |
company_type has 137 (3.6%) zeros | Zeros |
last_new_job has 1699 (44.3%) zeros | Zeros |
Reproduction
| Analysis started | 2022-02-20 15:11:52.088152 |
|---|---|
| Analysis finished | 2022-02-20 15:12:09.183580 |
| Duration | 17.1 seconds |
| Software version | pandas-profiling v3.1.0 |
df_index
Real number (ℝ≥0)
| Distinct | 3832 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 9646.476253 |
| Minimum | 2 |
|---|---|
| Maximum | 19155 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 30.1 KiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 1044.75 |
| Q1 | 4938.25 |
| median | 9613.5 |
| Q3 | 14413.25 |
| 95-th percentile | 18234.45 |
| Maximum | 19155 |
| Range | 19153 |
| Interquartile range (IQR) | 9475 |
Descriptive statistics
| Standard deviation | 5491.503126 |
|---|---|
| Coefficient of variation (CV) | 0.5692755553 |
| Kurtosis | -1.18122303 |
| Mean | 9646.476253 |
| Median Absolute Deviation (MAD) | 4742.5 |
| Skewness | -0.006484189091 |
| Sum | 36965297 |
| Variance | 30156606.58 |
| Monotonicity | Not monotonic |
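Several of the figures above follow from one another: the coefficient of variation is the standard deviation divided by the mean (5491.503126 / 9646.476253 ≈ 0.5693), the variance is the squared standard deviation, and the range is the maximum minus the minimum (19155 - 2 = 19153). The sketch below reproduces these relationships on a small made-up sample with the standard library. The `describe` helper is hypothetical, and the choice of the sample (n - 1) standard deviation is an assumption about how the report was computed.

```python
import statistics

def describe(values):
    """Return a few of the summary statistics shown in the report."""
    values = list(values)
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    median = statistics.median(values)
    # Median absolute deviation: the median of |x - median|.
    mad = statistics.median([abs(x - median) for x in values])
    return {
        "mean": mean,
        "std": std,
        "variance": std ** 2,
        "cv": std / mean,  # coefficient of variation
        "range": max(values) - min(values),
        "mad": mad,
    }

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data
stats = describe(sample)
```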
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 686 | 1 | < 0.1% |
| 7869 | 1 | < 0.1% |
| 15302 | 1 | < 0.1% |
| 4882 | 1 | < 0.1% |
| 3713 | 1 | < 0.1% |
| 18621 | 1 | < 0.1% |
| 5923 | 1 | < 0.1% |
| 11592 | 1 | < 0.1% |
| 3832 | 1 | < 0.1% |
| 15667 | 1 | < 0.1% |
| Other values (3822) | 3822 | 99.7% |
| Value | Count | Frequency (%) |
| 2 | 1 | < 0.1% |
| 9 | 1 | < 0.1% |
| 10 | 1 | < 0.1% |
| 11 | 1 | < 0.1% |
| 15 | 1 | < 0.1% |
| 17 | 1 | < 0.1% |
| 21 | 1 | < 0.1% |
| 24 | 1 | < 0.1% |
| 26 | 1 | < 0.1% |
| 32 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 19155 | 1 | < 0.1% |
| 19149 | 1 | < 0.1% |
| 19135 | 1 | < 0.1% |
| 19132 | 1 | < 0.1% |
| 19129 | 1 | < 0.1% |
| 19128 | 1 | < 0.1% |
| 19126 | 1 | < 0.1% |
| 19124 | 1 | < 0.1% |
| 19123 | 1 | < 0.1% |
| 19122 | 1 | < 0.1% |
city
Real number (ℝ≥0)
| Distinct | 117 |
|---|---|
| Distinct (%) | 3.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 44.16310021 |
| Minimum | 0 |
|---|---|
| Maximum | 122 |
| Zeros | 8 |
| Zeros (%) | 0.2% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 30.1 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 5 |
| Q1 | 5 |
| median | 48 |
| Q3 | 64 |
| 95-th percentile | 103 |
| Maximum | 122 |
| Range | 122 |
| Interquartile range (IQR) | 59 |
Descriptive statistics
| Standard deviation | 35.27668561 |
|---|---|
| Coefficient of variation (CV) | 0.7987819117 |
| Kurtosis | -1.048270934 |
| Mean | 44.16310021 |
| Median Absolute Deviation (MAD) | 35 |
| Skewness | 0.3802565906 |
| Sum | 169233 |
| Variance | 1244.444548 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 5 | 862 | 22.5% |
| 64 | 574 | 15.0% |
| 48 | 285 | 7.4% |
| 13 | 274 | 7.2% |
| 49 | 166 | 4.3% |
| 30 | 127 | 3.3% |
| 95 | 84 | 2.2% |
| 4 | 61 | 1.6% |
| 6 | 58 | 1.5% |
| 99 | 58 | 1.5% |
| Other values (107) | 1283 | 33.5% |
| Value | Count | Frequency (%) |
| 0 | 8 | 0.2% |
| 1 | 17 | 0.4% |
| 2 | 55 | 1.4% |
| 3 | 19 | 0.5% |
| 4 | 61 | 1.6% |
| 5 | 862 | 22.5% |
| 6 | 58 | 1.5% |
| 7 | 14 | 0.4% |
| 8 | 3 | 0.1% |
| 9 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 122 | 15 | 0.4% |
| 121 | 9 | 0.2% |
| 120 | 16 | 0.4% |
| 119 | 4 | 0.1% |
| 118 | 7 | 0.2% |
| 117 | 9 | 0.2% |
| 116 | 44 | 1.1% |
| 115 | 1 | < 0.1% |
| 114 | 11 | 0.3% |
| 113 | 4 | 0.1% |
city_development_index
Real number (ℝ≥0)
| Distinct | 88 |
|---|---|
| Distinct (%) | 2.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 62.11482255 |
| Minimum | 0 |
|---|---|
| Maximum | 92 |
| Zeros | 2 |
| Zeros (%) | 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 30.1 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 14 |
| Q1 | 27 |
| median | 80 |
| Q3 | 85 |
| 95-th percentile | 90 |
| Maximum | 92 |
| Range | 92 |
| Interquartile range (IQR) | 58 |
Descriptive statistics
| Standard deviation | 29.56445109 |
|---|---|
| Coefficient of variation (CV) | 0.4759645101 |
| Kurtosis | -1.126743255 |
| Mean | 62.11482255 |
| Median Absolute Deviation (MAD) | 9 |
| Skewness | -0.760914858 |
| Sum | 238024 |
| Variance | 874.056768 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 85 | 1028 | 26.8% |
| 14 | 574 | 15.0% |
| 82 | 285 | 7.4% |
| 90 | 274 | 7.2% |
| 27 | 137 | 3.6% |
| 78 | 127 | 3.3% |
| 91 | 86 | 2.2% |
| 67 | 84 | 2.2% |
| 57 | 61 | 1.6% |
| 88 | 58 | 1.5% |
| Other values (78) | 1118 | 29.2% |
| Value | Count | Frequency (%) |
| 0 | 2 | 0.1% |
| 1 | 4 | 0.1% |
| 3 | 2 | 0.1% |
| 4 | 1 | < 0.1% |
| 5 | 1 | < 0.1% |
| 6 | 1 | < 0.1% |
| 7 | 17 | 0.4% |
| 8 | 52 | 1.4% |
| 9 | 16 | 0.4% |
| 10 | 4 | 0.1% |
| Value | Count | Frequency (%) |
| 92 | 9 | 0.2% |
| 91 | 86 | 2.2% |
| 90 | 274 | 7.2% |
| 89 | 27 | 0.7% |
| 88 | 58 | 1.5% |
| 87 | 34 | 0.9% |
| 86 | 1 | < 0.1% |
| 85 | 1028 | 26.8% |
| 84 | 15 | 0.4% |
| 83 | 42 | 1.1% |
gender
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 224.7 KiB |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1.0 |
|---|---|
| 2nd row | 0.0 |
| 3rd row | 1.0 |
| 4th row | 1.0 |
| 5th row | 1.0 |
Common Values
| Value | Count | Frequency (%) |
| 1.0 | 3419 | 89.2% |
| 0.0 | 372 | 9.7% |
| 2.0 | 41 | 1.1% |
Length
Histogram of lengths of the category
Pie chart
relevent_experience
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 217.2 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 1 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 2754 | 71.9% |
| 1 | 1078 | 28.1% |
Length
Histogram of lengths of the category
Pie chart
enrolled_university
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 224.7 KiB |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 2.0 |
|---|---|
| 2nd row | 0.0 |
| 3rd row | 2.0 |
| 4th row | 2.0 |
| 5th row | 2.0 |
Common Values
| Value | Count | Frequency (%) |
| 2.0 | 2795 | 72.9% |
| 0.0 | 796 | 20.8% |
| 1.0 | 241 | 6.3% |
Length
Histogram of lengths of the category
Pie chart
education_level
Categorical
| Distinct | 5 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 224.7 KiB |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0.0 |
|---|---|
| 2nd row | 0.0 |
| 3rd row | 1.0 |
| 4th row | 0.0 |
| 5th row | 0.0 |
Common Values
| Value | Count | Frequency (%) |
| 0.0 | 2385 | 62.2% |
| 2.0 | 916 | 23.9% |
| 1.0 | 401 | 10.5% |
| 3.0 | 66 | 1.7% |
| 4.0 | 64 | 1.7% |
Length
Histogram of lengths of the category
Pie chart
major_discipline
Real number (ℝ≥0)
| Distinct | 6 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.717379958 |
| Minimum | 0 |
|---|---|
| Maximum | 5 |
| Zeros | 53 |
| Zeros (%) | 1.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 30.1 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 5 |
| median | 5 |
| Q3 | 5 |
| 95-th percentile | 5 |
| Maximum | 5 |
| Range | 5 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.9540366772 |
|---|---|
| Coefficient of variation (CV) | 0.2022386761 |
| Kurtosis | 11.48611069 |
| Mean | 4.717379958 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | -3.505073471 |
| Sum | 18077 |
| Variance | 0.9101859814 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=6)
| Value | Count | Frequency (%) |
| 5 | 3463 | 90.4% |
| 2 | 140 | 3.7% |
| 4 | 76 | 2.0% |
| 1 | 61 | 1.6% |
| 0 | 53 | 1.4% |
| 3 | 39 | 1.0% |
| Value | Count | Frequency (%) |
| 0 | 53 | 1.4% |
| 1 | 61 | 1.6% |
| 2 | 140 | 3.7% |
| 3 | 39 | 1.0% |
| 4 | 76 | 2.0% |
| 5 | 3463 | 90.4% |
| Value | Count | Frequency (%) |
| 5 | 3463 | 90.4% |
| 4 | 76 | 2.0% |
| 3 | 39 | 1.0% |
| 2 | 140 | 3.7% |
| 1 | 61 | 1.6% |
| 0 | 53 | 1.4% |
experience
Real number (ℝ≥0)
| Distinct | 22 |
|---|---|
| Distinct (%) | 0.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 12.84916493 |
| Minimum | 0 |
|---|---|
| Maximum | 21 |
| Zeros | 107 |
| Zeros (%) | 2.8% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 30.1 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 7 |
| median | 14 |
| Q3 | 18 |
| 95-th percentile | 21 |
| Maximum | 21 |
| Range | 21 |
| Interquartile range (IQR) | 11 |
Descriptive statistics
| Standard deviation | 6.579093656 |
|---|---|
| Coefficient of variation (CV) | 0.5120249988 |
| Kurtosis | -0.9627078369 |
| Mean | 12.84916493 |
| Median Absolute Deviation (MAD) | 5 |
| Skewness | -0.4988774109 |
| Sum | 49238 |
| Variance | 43.28447333 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=22)
| Value | Count | Frequency (%) |
| 21 | 623 | 16.3% |
| 15 | 302 | 7.9% |
| 14 | 290 | 7.6% |
| 13 | 260 | 6.8% |
| 16 | 239 | 6.2% |
| 11 | 227 | 5.9% |
| 1 | 201 | 5.2% |
| 19 | 199 | 5.2% |
| 17 | 187 | 4.9% |
| 18 | 185 | 4.8% |
| Other values (12) | 1119 | 29.2% |
| Value | Count | Frequency (%) |
| 0 | 107 | 2.8% |
| 1 | 201 | 5.2% |
| 2 | 130 | 3.4% |
| 3 | 101 | 2.6% |
| 4 | 87 | 2.3% |
| 5 | 101 | 2.6% |
| 6 | 148 | 3.9% |
| 7 | 119 | 3.1% |
| 8 | 72 | 1.9% |
| 9 | 50 | 1.3% |
| Value | Count | Frequency (%) |
| 21 | 623 | 16.3% |
| 20 | 113 | 2.9% |
| 19 | 199 | 5.2% |
| 18 | 185 | 4.8% |
| 17 | 187 | 4.9% |
| 16 | 239 | 6.2% |
| 15 | 302 | 7.9% |
| 14 | 290 | 7.6% |
| 13 | 260 | 6.8% |
| 12 | 20 | 0.5% |
company_size
Real number (ℝ≥0)
| Distinct | 8 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.158663883 |
| Minimum | 0 |
|---|---|
| Maximum | 7 |
| Zeros | 371 |
| Zeros (%) | 9.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 30.1 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 3 |
| Q3 | 4 |
| 95-th percentile | 7 |
| Maximum | 7 |
| Range | 7 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Standard deviation | 2.028479838 |
|---|---|
| Coefficient of variation (CV) | 0.6421955335 |
| Kurtosis | -0.7877589264 |
| Mean | 3.158663883 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 0.2345289145 |
| Sum | 12104 |
| Variance | 4.114730451 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=8)
| Value | Count | Frequency (%) |
| 4 | 908 | 23.7% |
| 1 | 656 | 17.1% |
| 3 | 597 | 15.6% |
| 2 | 458 | 12.0% |
| 0 | 371 | 9.7% |
| 7 | 334 | 8.7% |
| 5 | 277 | 7.2% |
| 6 | 231 | 6.0% |
| Value | Count | Frequency (%) |
| 0 | 371 | 9.7% |
| 1 | 656 | 17.1% |
| 2 | 458 | 12.0% |
| 3 | 597 | 15.6% |
| 4 | 908 | 23.7% |
| 5 | 277 | 7.2% |
| 6 | 231 | 6.0% |
| 7 | 334 | 8.7% |
| Value | Count | Frequency (%) |
| 7 | 334 | 8.7% |
| 6 | 231 | 6.0% |
| 5 | 277 | 7.2% |
| 4 | 908 | 23.7% |
| 3 | 597 | 15.6% |
| 2 | 458 | 12.0% |
| 1 | 656 | 17.1% |
| 0 | 371 | 9.7% |
company_type
Real number (ℝ≥0)
| Distinct | 6 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.393267223 |
| Minimum | 0 |
|---|---|
| Maximum | 5 |
| Zeros | 137 |
| Zeros (%) | 3.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 30.1 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 5 |
| median | 5 |
| Q3 | 5 |
| 95-th percentile | 5 |
| Maximum | 5 |
| Range | 5 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 1.336821295 |
|---|---|
| Coefficient of variation (CV) | 0.304288637 |
| Kurtosis | 3.734393113 |
| Mean | 4.393267223 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | -2.251681479 |
| Sum | 16835 |
| Variance | 1.787091176 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=6)
| Value | Count | Frequency (%) |
| 5 | 2925 | 76.3% |
| 4 | 423 | 11.0% |
| 1 | 203 | 5.3% |
| 0 | 137 | 3.6% |
| 2 | 117 | 3.1% |
| 3 | 27 | 0.7% |
| Value | Count | Frequency (%) |
| 0 | 137 | 3.6% |
| 1 | 203 | 5.3% |
| 2 | 117 | 3.1% |
| 3 | 27 | 0.7% |
| 4 | 423 | 11.0% |
| 5 | 2925 | 76.3% |
| Value | Count | Frequency (%) |
| 5 | 2925 | 76.3% |
| 4 | 423 | 11.0% |
| 3 | 27 | 0.7% |
| 2 | 117 | 3.1% |
| 1 | 203 | 5.3% |
| 0 | 137 | 3.6% |
last_new_job
Real number (ℝ≥0)
| Distinct | 6 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.740344468 |
| Minimum | 0 |
|---|---|
| Maximum | 5 |
| Zeros | 1699 |
| Zeros (%) | 44.3% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 30.1 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 1 |
| Q3 | 4 |
| 95-th percentile | 5 |
| Maximum | 5 |
| Range | 5 |
| Interquartile range (IQR) | 4 |
Descriptive statistics
| Standard deviation | 1.934765045 |
|---|---|
| Coefficient of variation (CV) | 1.111713848 |
| Kurtosis | -1.353111525 |
| Mean | 1.740344468 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 0.5656052695 |
| Sum | 6669 |
| Variance | 3.743315778 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=6)
| Value | Count | Frequency (%) |
| 0 | 1699 | 44.3% |
| 4 | 662 | 17.3% |
| 1 | 580 | 15.1% |
| 5 | 486 | 12.7% |
| 2 | 204 | 5.3% |
| 3 | 201 | 5.2% |
| Value | Count | Frequency (%) |
| 0 | 1699 | 44.3% |
| 1 | 580 | 15.1% |
| 2 | 204 | 5.3% |
| 3 | 201 | 5.2% |
| 4 | 662 | 17.3% |
| 5 | 486 | 12.7% |
| Value | Count | Frequency (%) |
| 5 | 486 | 12.7% |
| 4 | 662 | 17.3% |
| 3 | 201 | 5.2% |
| 2 | 204 | 5.3% |
| 1 | 580 | 15.1% |
| 0 | 1699 | 44.3% |
training_hours
Real number (ℝ≥0)
| Distinct | 232 |
|---|---|
| Distinct (%) | 6.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 61.63726514 |
| Minimum | 0 |
|---|---|
| Maximum | 240 |
| Zeros | 5 |
| Zeros (%) | 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 30.1 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 7 |
| Q1 | 22 |
| median | 46 |
| Q3 | 87 |
| 95-th percentile | 171 |
| Maximum | 240 |
| Range | 240 |
| Interquartile range (IQR) | 65 |
Descriptive statistics
| Standard deviation | 51.64722767 |
|---|---|
| Coefficient of variation (CV) | 0.8379221167 |
| Kurtosis | 1.147776109 |
| Mean | 61.63726514 |
| Median Absolute Deviation (MAD) | 29 |
| Skewness | 1.263712933 |
| Sum | 236194 |
| Variance | 2667.436126 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 16 | 66 | 1.7% |
| 11 | 64 | 1.7% |
| 27 | 63 | 1.6% |
| 41 | 62 | 1.6% |
| 21 | 62 | 1.6% |
| 19 | 60 | 1.6% |
| 23 | 58 | 1.5% |
| 49 | 55 | 1.4% |
| 25 | 53 | 1.4% |
| 10 | 52 | 1.4% |
| Other values (222) | 3237 | 84.5% |
| Value | Count | Frequency (%) |
| 0 | 5 | 0.1% |
| 1 | 14 | 0.4% |
| 2 | 18 | 0.5% |
| 3 | 37 | 1.0% |
| 4 | 22 | 0.6% |
| 5 | 45 | 1.2% |
| 6 | 48 | 1.3% |
| 7 | 43 | 1.1% |
| 8 | 50 | 1.3% |
| 9 | 48 | 1.3% |
| Value | Count | Frequency (%) |
| 240 | 4 | 0.1% |
| 238 | 2 | 0.1% |
| 237 | 2 | 0.1% |
| 236 | 1 | < 0.1% |
| 235 | 5 | 0.1% |
| 233 | 2 | 0.1% |
| 232 | 3 | 0.1% |
| 231 | 4 | 0.1% |
| 230 | 4 | 0.1% |
| 229 | 4 | 0.1% |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
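The rank-based definition of ρ can be sketched with the standard library alone: rank both variables, then apply the Pearson formula to the ranks. The helpers `average_ranks`, `pearson` and `spearman` are hypothetical names for illustration, and assigning tied values their average rank is an assumption (a common convention), not something the report states.

```python
from statistics import fmean

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of equal values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Covariance divided by the product of standard deviations."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [a ** 3 for a in x]  # nonlinear but strictly increasing

rho = spearman(x, y)  # ranks agree perfectly
r = pearson(x, y)     # linear fit is imperfect
```

On this made-up data the ranks of `x` and `y` coincide, so `rho` is 1 up to floating-point error, while `r` stays below 1 because the relationship is not linear.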
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
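As a small illustration of the formula and of the invariance property, the sketch below computes r from scratch on made-up data and then again after shifting and rescaling both variables. The `pearson_r` helper is a hypothetical name; the normalising factors of the covariance and the standard deviations cancel, so plain sums of squares are used.

```python
from statistics import fmean

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]  # made-up data

r = pearson_r(x, y)

# Changing location and (positive) scale of either variable leaves r unchanged.
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])

# Any exact increasing line gives r = 1, however steep or shallow it is.
r_steep = pearson_r(x, [100 * a + 1 for a in x])
r_shallow = pearson_r(x, [0.01 * a + 1 for a in x])
```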
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
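The pair-counting description corresponds to the variant usually called tau-a. A minimal sketch, assuming no special treatment of ties (tied pairs count as neither concordant nor discordant); the `kendall_tau_a` helper is a hypothetical name, and libraries often report the tie-adjusted tau-b variant instead, so values can differ on data with many ties.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both variables move in the same direction
        elif sign < 0:
            discordant += 1  # the variables move in opposite directions
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

x = [1, 2, 3, 4]
y = [1, 3, 2, 4]  # made-up data: one swapped pair out of six

tau = kendall_tau_a(x, y)  # (5 - 1) / 6
```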
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
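A bias-corrected Cramér's V can be sketched as follows. This is one reading of Bergsma's correction written with the standard library, not the code the report generator runs: the chi-squared statistic is turned into φ² = χ²/n, φ² and the table dimensions are each shrunk by a term in 1/(n - 1), and V is the square root of their ratio. The `cramers_v_corrected` helper and the toy data are hypothetical.

```python
from collections import Counter
from math import sqrt

def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equal-length sequences of labels."""
    n = len(x)
    joint = Counter(zip(x, y))
    row_totals = Counter(x)
    col_totals = Counter(y)
    r, k = len(row_totals), len(col_totals)
    if n < 2 or r < 2 or k < 2:
        return 0.0  # association is undefined for a constant variable

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a, row_count in row_totals.items():
        for b, col_count in col_totals.items():
            expected = row_count * col_count / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0

# Made-up data: y is fully determined by x in the first case, unrelated in the second.
v_perfect = cramers_v_corrected(["a", "b"] * 50, ["u", "v"] * 50)
v_independent = cramers_v_corrected(["a", "a", "b", "b"] * 25, ["u", "v", "u", "v"] * 25)
```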
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
First rows
| | df_index | city | city_development_index | gender | relevent_experience | enrolled_university | education_level | major_discipline | experience | company_size | company_type | last_new_job | training_hours |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 686 | 103 | 91 | 1.0 | 0 | 2.0 | 0.0 | 5.0 | 3.0 | 2.0 | 5.0 | 0.0 | 41 |
| 1 | 1106 | 116 | 27 | 0.0 | 1 | 0.0 | 0.0 | 5.0 | 11.0 | 5.0 | 5.0 | 5.0 | 17 |
| 2 | 8966 | 103 | 91 | 1.0 | 0 | 2.0 | 1.0 | 5.0 | 15.0 | 7.0 | 5.0 | 0.0 | 21 |
| 3 | 7961 | 38 | 63 | 1.0 | 0 | 2.0 | 0.0 | 5.0 | 1.0 | 2.0 | 4.0 | 0.0 | 77 |
| 4 | 5182 | 64 | 14 | 1.0 | 0 | 2.0 | 0.0 | 5.0 | 21.0 | 7.0 | 5.0 | 4.0 | 77 |
| 5 | 14207 | 79 | 12 | 0.0 | 1 | 0.0 | 0.0 | 5.0 | 14.0 | 5.0 | 5.0 | 1.0 | 144 |
| 6 | 14783 | 5 | 85 | 1.0 | 0 | 2.0 | 0.0 | 5.0 | 14.0 | 1.0 | 5.0 | 0.0 | 33 |
| 7 | 821 | 49 | 85 | 1.0 | 0 | 2.0 | 4.0 | 5.0 | 19.0 | 7.0 | 2.0 | 1.0 | 153 |
| 8 | 5762 | 5 | 85 | 1.0 | 1 | 1.0 | 1.0 | 5.0 | 11.0 | 5.0 | 5.0 | 0.0 | 46 |
| 9 | 16250 | 13 | 90 | 1.0 | 0 | 2.0 | 1.0 | 5.0 | 21.0 | 1.0 | 5.0 | 4.0 | 15 |
Last rows
| | df_index | city | city_development_index | gender | relevent_experience | enrolled_university | education_level | major_discipline | experience | company_size | company_type | last_new_job | training_hours |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3822 | 18759 | 2 | 73 | 1.0 | 0 | 1.0 | 0.0 | 5.0 | 18.0 | 4.0 | 5.0 | 0.0 | 27 |
| 3823 | 16881 | 5 | 85 | 1.0 | 0 | 2.0 | 0.0 | 5.0 | 21.0 | 3.0 | 2.0 | 1.0 | 197 |
| 3824 | 11645 | 17 | 29 | 1.0 | 0 | 1.0 | 2.0 | 5.0 | 1.0 | 2.0 | 5.0 | 2.0 | 18 |
| 3825 | 10072 | 64 | 14 | 1.0 | 0 | 2.0 | 0.0 | 5.0 | 17.0 | 2.0 | 5.0 | 0.0 | 62 |
| 3826 | 11086 | 5 | 85 | 1.0 | 0 | 2.0 | 0.0 | 3.0 | 21.0 | 2.0 | 5.0 | 4.0 | 43 |
| 3827 | 9055 | 5 | 85 | 1.0 | 1 | 2.0 | 0.0 | 5.0 | 13.0 | 2.0 | 5.0 | 0.0 | 171 |
| 3828 | 11207 | 93 | 21 | 1.0 | 0 | 0.0 | 0.0 | 5.0 | 21.0 | 3.0 | 5.0 | 0.0 | 7 |
| 3829 | 16790 | 5 | 85 | 1.0 | 1 | 2.0 | 0.0 | 5.0 | 6.0 | 5.0 | 4.0 | 2.0 | 217 |
| 3830 | 5575 | 70 | 91 | 1.0 | 1 | 0.0 | 1.0 | 5.0 | 11.0 | 3.0 | 5.0 | 5.0 | 180 |
| 3831 | 10387 | 13 | 90 | 1.0 | 0 | 2.0 | 2.0 | 5.0 | 21.0 | 1.0 | 5.0 | 4.0 | 71 |